ABSTRACT


A new clustering method called CLASSY has been developed, which alternates 
maximum likelihood iteration with a procedure for splitting, combining, and 
eliminating the resulting statistics. The objectives are to maximize the fit 
of a mixture of normal distributions to the observed first through fourth 
central moments of the data and to produce an estimate of the proportions, 
means, and covariances in this mixture. This document describes the mathe- 
matical model which is the basis for CLASSY and the actual operation of the 
algorithm and compares the results of CLASSY with those produced by ISOCLS, 
which currently performs these functions. Simulated and actual LACIE data 
are used in the comparisons. 
1. INTRODUCTION

The Large Area Crop Inventory Experiment (LACIE) is dependent upon clustering 
for the determination of spectral classes within a scene. Currently, the 
Iterative Self-Organizing Clustering System (ISOCLS) is used for this purpose 
(ref. 1). ISOCLS is basically a variation of the k-means or ISODATA algorithm
of Ball and Hall (ref. 2). Although this algorithm may be interpreted as a 
simplified maximum likelihood procedure, it is fundamentally a heuristic 
algorithm for breaking a data set into fairly homogeneous compact clusters. 

The purpose of this study was to compare ISOCLS as a clustering method with 
a new clustering method called CLASSY.¹ CLASSY operates by alternating
maximum likelihood iteration with a procedure for splitting, combining, and
eliminating the resultant statistics in order to maximize the fit of a mixture
of normal distributions to the observed first through fourth central
moments of the data. It is based on a formal mathematical model of the data 
as a mixture of multivariate normal distributions. CLASSY produces an esti- 
mate of the proportions, means, and covariances in this mixture. It differs 
from standard maximum likelihood procedures in that it also generates an 
estimate of the number of components of the mixture via the split, combine, 
and eliminate operations. 

Section 2 of this report describes the mathematical model which is the basis 
for CLASSY and provides a brief description of the actual operation of the 
algorithm. The results section (3.3) presents data comparing the performances 
of CLASSY and ISOCLS on simulated data and on actual LACIE data. Finally, 
these results are evaluated, and conclusions and recommendations are developed 
(section 4). 


¹ CLASSY was developed by Dr. M. E. Rassbach while he was a National Research
Council postdoctoral fellow working at the Lyndon B. Johnson Space Center. 
2. MATHEMATICAL DESCRIPTION
2.1 ASSUMPTIONS AND PROBLEM DEFINITION 

The fundamental mathematical assumption underlying CLASSY is that the data 
may be represented by a mixture of multivariate normal densities. That is, 
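if $p$ denotes probability and $\mathbf{x}$ is an observation vector,

$$p(\mathbf{x} \mid \boldsymbol{\pi}) = \sum_{i=1}^{m} \alpha_i \, p_i(\mathbf{x} \mid \boldsymbol{\mu}_i, \Sigma_i) \tag{1}$$

where

$\alpha_i$ = the a priori probability of occurrence of class $i$

$p_i(\mathbf{x} \mid \boldsymbol{\mu}_i, \Sigma_i)$ = the multivariate normal probability density function for
class $i$

$m$ = the total number of classes

$\boldsymbol{\pi}$ = the vector of parameters
= $\{\alpha_i, \boldsymbol{\mu}_i, \Sigma_i\}$, $i = 1, \ldots, m$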

Given a set of unlabeled sample vectors {xJ, we may form the likelihood 
function in the following manner. 



where N = the total number of samples. 
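For samples drawn independently from the mixture of equation (1), the likelihood takes the standard product form. The expression below is a sketch under that independence assumption, written in the notation of equation (1); the sample index $j$ is an assumed convention.

```latex
p(\{\mathbf{x}_j\} \mid m, \boldsymbol{\pi})
  = \prod_{j=1}^{N} \sum_{i=1}^{m}
    \alpha_i \, p_i(\mathbf{x}_j \mid \boldsymbol{\mu}_i, \Sigma_i)
```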

So far, the assumptions and equations parallel the usual maximum likelihood 
development. CLASSY makes the additional assumption that each value of the 
parameters $m$ and $\boldsymbol{\pi}$ occurs with an a priori probability $A(m, \boldsymbol{\pi})$. The objective
of CLASSY, then, is to determine the discrete parameter $m$ and the continuous
parameter vector $\boldsymbol{\pi}$ so as to maximize the following function.
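One natural reading, given a prior over the parameters and a likelihood for the samples, is a prior-weighted (maximum a posteriori) objective. The form below is an assumption offered for orientation and is not a transcription of equation (6); the symbol $L$ follows its use in section 2.2, and the independence of the samples is likewise assumed.

```latex
L(\{\mathbf{x}_j\}, m, \boldsymbol{\pi})
  = A(m, \boldsymbol{\pi}) \prod_{j=1}^{N} \sum_{i=1}^{m}
    \alpha_i \, p_i(\mathbf{x}_j \mid \boldsymbol{\mu}_i, \Sigma_i)
```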
2.2 SOLUTION PROCEDURE

Many approaches may be taken in maximizing equation (6). The approach chosen
in CLASSY is to interleave the standard maximum likelihood iteration [designed
to maximize $L(\{\mathbf{x}_j\}, m, \boldsymbol{\pi})$ with respect to the continuous parameter vector $\boldsymbol{\pi}$]
with a discrete split, combine, and eliminate process [designed to maximize
$L(\{\mathbf{x}_j\}, m, \boldsymbol{\pi})$ with respect to the discrete parameter $m$]. It is expected that,
by alternating these two techniques, values of $m$ and $\boldsymbol{\pi}$ corresponding to at
least a local maximum of $L(\{\mathbf{x}_j\}, m, \boldsymbol{\pi})$ will be determined.
The maximum likelihood iteration is carried out in the standard manner. The
data are first scrambled to ensure that a true random sample is obtained.
This is especially important in the CLASSY algorithm since any correlation
in the data may cause the maximum likelihood procedure to converge to a very
poor local maximum or perhaps to fail to converge at all. The initial values
assumed are



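The data are then examined point by point, and the parameter vector $\boldsymbol{\pi}$ is
iteratively adjusted according to the iterative maximum likelihood equations,
which may be expressed as follows.

$$P_{(j)}[i \mid \mathbf{x}_k, \boldsymbol{\pi}(j)] = \frac{\alpha_i(j)\, p_i[\mathbf{x}_k \mid \boldsymbol{\mu}_i(j), \Sigma_i(j)]}{\sum_{l=1}^{m} \alpha_l(j)\, p_l[\mathbf{x}_k \mid \boldsymbol{\mu}_l(j), \Sigma_l(j)]} \tag{8}$$

$$\alpha_i(j+1) = \frac{1}{N} \sum_{k=1}^{N} P_{(j)}[i \mid \mathbf{x}_k, \boldsymbol{\pi}(j)] \tag{9}$$

$$\boldsymbol{\mu}_i(j+1) = \frac{\sum_{k=1}^{N} P_{(j)}[i \mid \mathbf{x}_k, \boldsymbol{\pi}(j)]\, \mathbf{x}_k}{\sum_{k=1}^{N} P_{(j)}[i \mid \mathbf{x}_k, \boldsymbol{\pi}(j)]} \tag{10}$$

$$\Sigma_i(j+1) = \frac{\sum_{k=1}^{N} P_{(j)}[i \mid \mathbf{x}_k, \boldsymbol{\pi}(j)]\, [\mathbf{x}_k - \boldsymbol{\mu}_i(j)][\mathbf{x}_k - \boldsymbol{\mu}_i(j)]^T}{\sum_{k=1}^{N} P_{(j)}[i \mid \mathbf{x}_k, \boldsymbol{\pi}(j)]} \tag{11}$$

where

$P_{(j)}[i \mid \mathbf{x}_k, \boldsymbol{\pi}(j)]$ = the posterior probability of class $i$ on iteration
$j + 1$, given the $k$th sample vector and value of the
parameters on the $j$th iteration

$\alpha_i(j)$, $\boldsymbol{\mu}_i(j)$, and $\Sigma_i(j)$ = the values of the parameters on the $j$th iteration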
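The iteration can be illustrated with a minimal sketch. It assumes a univariate mixture (so each covariance matrix reduces to a variance) and a batch pass over all samples rather than the point-by-point adjustment the program performs; the function names `normal_pdf` and `ml_iteration` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density (1-D stand-in for the class density)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def ml_iteration(data, alphas, means, variances):
    """One batch maximum likelihood pass for a univariate normal mixture."""
    m, n = len(alphas), len(data)
    # Posterior probability of each class for each sample.
    post = []
    for x in data:
        joint = [alphas[i] * normal_pdf(x, means[i], variances[i]) for i in range(m)]
        total = sum(joint)
        post.append([value / total for value in joint])
    # Accumulated posterior weight of each class.
    weights = [sum(post[k][i] for k in range(n)) for i in range(m)]
    # Updated proportions: average posterior weight.
    new_alphas = [w / n for w in weights]
    # Updated means: posterior-weighted average of the samples.
    new_means = [
        sum(post[k][i] * data[k] for k in range(n)) / weights[i] for i in range(m)
    ]
    # Updated variances: weighted spread about the means of the current iteration.
    new_vars = [
        sum(post[k][i] * (data[k] - means[i]) ** 2 for k in range(n)) / weights[i]
        for i in range(m)
    ]
    return new_alphas, new_means, new_vars


# Illustrative data: two well-separated groups of three points each.
data = [0.9, 1.1, 1.0, 4.8, 5.2, 5.0]
alphas, means, variances = [0.5, 0.5], [0.0, 6.0], [1.0, 1.0]
for _ in range(20):
    alphas, means, variances = ml_iteration(data, alphas, means, variances)
```

With well-separated groups such as these, the means settle near the two group centers after a few passes; poorly separated or strongly ordered data would behave less predictably, which is consistent with the scrambling step described in the text.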


In addition to iterating on these parameters, the program also accumulates 
the third- and fourth-order moments and the logarithm likelihood for each 
cluster. These statistics are computed on a point-by-point basis simul- 
taneously with the parameter iteration. This means that the parameters are 
evolving as the moments and the logarithm likelihood are accumulated; and 
thus, only approximate values are generated. 


As each point is considered, the probability that it belongs to each class is
computed. These probabilities may be thought of as the fractional part of 
each data point which is assigned to each cluster. These probabilities are 
accumulated as the "weights" for each cluster. When the weight for a given 
cluster exceeds a threshold value, which increases each time it is exceeded, 
the maximum likelihood iteration is stopped; and the program then checks the 
fit of the normal distribution to the data for that cluster. 

The fit of the hypothesized normal distribution to the data for a cluster is 
evaluated by examining the third- and fourth-order moments, which represent 
measures of skewness and kurtosis. The statistics which are generated are 
given by 

$$S_1 = S\, \Sigma^{-1} S^T \tag{12}$$

where

$S$ = the skewness vector

$S_1$ = a scalar measure of skewness

$S^T$ = transpose of $S$

$\Sigma^{-1}$ = the inverse covariance matrix

$$K_1 = \mathrm{Tr}(K \Sigma^{-1}) \tag{13}$$

$$K_2 = \mathrm{Tr}(K \Sigma^{-1} K \Sigma^{-1}) - [\mathrm{Tr}(\Sigma^{-1} K)]^2 \tag{14}$$

where

$K$ = matrix of kurtosis values

$K_1$, $K_2$ = scalar measures of kurtosis
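The three scalar statistics can be computed directly once a skewness vector, a kurtosis matrix, and a covariance matrix are in hand. The sketch below assumes the readings $S_1 = S \Sigma^{-1} S^T$, $K_1 = \mathrm{Tr}(K \Sigma^{-1})$, and $K_2 = \mathrm{Tr}(K \Sigma^{-1} K \Sigma^{-1}) - [\mathrm{Tr}(\Sigma^{-1} K)]^2$ for equations (12) through (14), and it is restricted to two dimensions so that the matrix inverse can be written out; the helper names and the input values are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def inverse_2x2(a):
    """Inverse of a 2 x 2 matrix (assumes a nonzero determinant)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def fit_statistics(skew, kurt, cov):
    """Return (S1, K1, K2) for a two-dimensional cluster."""
    cov_inv = inverse_2x2(cov)
    # S1: quadratic form of the skewness vector in the inverse covariance.
    row = matmul([skew], cov_inv)[0]
    s1 = sum(row[i] * skew[i] for i in range(len(skew)))
    # K1: trace of the kurtosis matrix times the inverse covariance.
    k_cov_inv = matmul(kurt, cov_inv)
    k1 = trace(k_cov_inv)
    # K2: trace of the squared product minus the squared trace.
    k2 = trace(matmul(k_cov_inv, k_cov_inv)) - k1 ** 2
    return s1, k1, k2


# Illustrative inputs only; they are not taken from the report.
s1, k1, k2 = fit_statistics(
    skew=[1.0, 2.0],
    kurt=[[2.0, 0.0], [0.0, 3.0]],
    cov=[[2.0, 0.0], [0.0, 0.5]],
)
```

How the skewness vector and kurtosis matrix are accumulated, and which thresholds the statistics are tested against, are not specified here, so the sketch covers only the final scalar reduction.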

In CLASSY, these three statistics are tested against their approximate sam- 
pling distributions computed under the hypothesis that the samples were 
drawn from the normal distribution specified by the current values of the
parameters. If any one of these three statistics exceeds the threshold 
value, the cluster is split into two parts. The parameters for each of the 
two new clusters are determined in order to minimize the difference between 
the observed covariance matrix, the skewness vector, and the kurtosis matrix 
and the corresponding quantities for the mixture distribution composed of 
the two new normal distributions. 

Following a split, the parent cluster is not discarded immediately. When 
the maximum likelihood iteration cycle is begun again, it is carried out for 
the previously existing clusters, including the parent cluster and the new 
subclusters (with the new parameters and a weight of 40 points each). Thus, 
a hierarchical structure or cluster tree evolves as this process is repeated. 

At the same time in the processing that a cluster is checked to see if it 
needs to be split, certain other tests are performed. If a cluster has sub- 
clusters (i.e., has been previously split), it is not split again; but the 
likelihood ratio of the daughter clusters to the parent cluster is examined.
If this ratio is larger than a given threshold, then the parent cluster is 
eliminated and the daughter clusters take its place. On the other hand, if 
this ratio is too small, the daughter clusters are eliminated in favor of 
the parent. In addition, a cluster may be eliminated if its prior probability 
becomes too small. The program also checks the degree of overlap between 
clusters at the same level in the cluster tree. If the degree of overlap 
is too great and the two clusters are not the only subclusters of a given 
parent cluster, the parameters and other statistics for the two clusters are 
joined. All of these tests allow for periodic restructuring of the cluster 
tree at certain intervals; namely, when the weight (or number of points
assigned to a given cluster on a fractional probabilistic basis) has
accumulated to a certain point in the maximum likelihood iteration portion of the
program. 
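The tests applied to a previously split cluster can be summarized as a small decision function. This is a sketch only: the threshold names and values are hypothetical, the order in which the tests are applied is an assumption, and the overlap test between sibling clusters is omitted.

```python
def restructure_decision(ratio, accept_threshold, reject_threshold, prior, min_prior):
    """Choose an action for a parent cluster that already has daughter clusters.

    ratio is the likelihood ratio of the daughters to the parent; all
    thresholds are hypothetical tuning parameters.
    """
    if prior < min_prior:
        # The prior probability of the cluster has become too small.
        return "eliminate cluster"
    if ratio > accept_threshold:
        # The daughters explain the data much better than the parent.
        return "replace parent with daughters"
    if ratio < reject_threshold:
        # The split did not help; fall back to the parent.
        return "eliminate daughters"
    # Inconclusive: keep the whole subtree and continue iterating.
    return "keep parent and daughters"


decision = restructure_decision(
    ratio=12.0, accept_threshold=10.0, reject_threshold=0.5,
    prior=0.2, min_prior=0.01,
)
```

Keeping an inconclusive band between the two thresholds matches the description that the parent is not discarded immediately after a split.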

After tests have been made to determine if a cluster needs to be split or if 
the cluster tree needs to be restructured, the skewness vector and the 
kurtosis matrix for that cluster are reset to zero. The program then con- 
tinues the process of maximum likelihood iteration. If a complete pass 
through the data set is made before a cluster is tested for possible adjust- 
ment, then the values of the means at that time are used in equation (11) 
until another pass through the data set has been completed. 

The program recycles through the data a fixed number of times. The number 
of passes through the data is controlled by an external parameter. When the
desired number of passes is complete, the program goes through the data 
point by point and assigns each data point to the cluster in the cluster 
tree for which the probability of occurrence of this data point is the 
greatest. This is the only time in the program that points are assigned 
to clusters. When all of the points have been assigned, a cluster map showing
the cluster symbol for each point is printed out. The program also
prints out the final values for the parameters for each cluster in the 
cluster tree. 
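The final assignment step amounts to choosing, for each point, the cluster with the largest probability. The sketch below assumes univariate clusters and assumes that each density is weighted by its a priori probability; whether the program uses the weighted or the unweighted density is not stated, and the names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def assign_points(data, alphas, means, variances):
    """Return, for each point, the index of the most probable cluster."""
    labels = []
    for x in data:
        scores = [
            a * normal_pdf(x, mu, v) for a, mu, v in zip(alphas, means, variances)
        ]
        # Index of the largest score; ties go to the lowest index.
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


labels = assign_points(
    data=[0.9, 5.1, 1.2],
    alphas=[0.5, 0.5],
    means=[1.0, 5.0],
    variances=[1.0, 1.0],
)
```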
2.3 FLOW DIAGRAM

This section gives a general flow diagram for the CLASSY program (fig. 1). 
This is not a detailed flow diagram for the program but merely serves to 
summarize the information given in section 2.2 in a convenient manner. 
3. DATA, PROCEDURES, AND RESULTS


3.1 DATA SETS 

Two different data sets were used in this study. The first was a set of 
acquisitions of four different LACIE segments. The second was a set of four
different simulated acquisitions of a simulated LACIE segment. Each of these 
data sets is described separately in the following paragraphs. 

The four LACIE segments were selected on the basis of the availability of 
ground-truth grid-intersection dots and to provide a representative sampling 
of LACIE segments in terms of field structure and the proportion of wheat 
present. Once the segments had been chosen, the acquisition which had the 
largest Bhattacharyya distance of any of the available acquisitions was 
selected. The segment number, location, acquisition used, and the ground- 
truth percentages of wheat and small grains for each segment are given in 
table 1 . 

The simulated data set consisted of four simulated acquisitions. Each acqui- 
sition was derived first by specifying the mean vector and covariance matrix 
for each of 10 different classes. The class statistics for each class were 
specified so as to simulate the LACIE data for two wheat classes ($W_1$ and $W_2$),
two barley classes ($B_1$ and $B_2$), two classes of grass ($G_1$ and $G_2$), two stubble
classes ($S_1$ and $S_2$), and two classes of fallow ($F_1$ and $F_2$). Once the statistics
were specified, samples were generated from a normal distribution having
the statistics of a given class. These samples were then placed in rectangu- 
lar fields arranged over the simulated segment. This process was repeated 
for each class and for each of the four acquisitions. The arrangement of 
the simulated fields over the segment was the same for each acquisition. 

The pattern of the simulated fields is given in table 2. 

3.2 EVALUATION METHOD AND PROCEDURES 

CLASSY was evaluated using a comparative analysis method in which the clus- 
tering results of CLASSY were compared with those of ISOCLS using the ground 
truth as a reference. The evaluation procedure consisted of three steps.

a. The CLASSY and ISOCLS algorithms were applied to each segment in each
data set. The clustering results were then obtained in line-printer
cluster-map form.

b. The clusters in each map were labeled first by tabulating the cluster
symbol and the corresponding ground-truth label (as either wheat or
nonwheat) for each grid intersection where ground truth was available.
These results were tabulated, and the number of ground-truth wheat pixels
and ground-truth nonwheat pixels falling in each cluster was computed.

c. The clusters were then labeled wheat or nonwheat by majority rule.
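Steps b and c can be sketched as a tabulation followed by a majority vote. The function name, the "W"/"O" label codes, and the rule that ties go to nonwheat are assumptions made for illustration; the report does not say how ties were resolved.

```python
from collections import Counter, defaultdict


def label_clusters(dots):
    """Tabulate ground-truth dots per cluster and label clusters by majority.

    dots is an iterable of (cluster_symbol, truth) pairs, where truth is
    "W" for wheat or "O" for other (nonwheat).
    """
    counts = defaultdict(Counter)
    for symbol, truth in dots:
        counts[symbol][truth] += 1
    # Majority rule; a tie is labeled nonwheat (assumed convention).
    labels = {
        symbol: ("W" if c["W"] > c["O"] else "O") for symbol, c in counts.items()
    }
    return counts, labels


# Illustrative dots only.
dots = [("A", "W"), ("A", "W"), ("A", "O"),
        ("B", "O"), ("B", "O"), ("B", "W"),
        ("C", "W")]
counts, labels = label_clusters(dots)
```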


A measure of the accuracy of each clustering algorithm in separating wheat 
from nonwheat (or a measure of the overall purity of the wheat and nonwheat 
clusters) was computed by estimating the probability of correct classifica- 
tion (PCC) for the labeled clusters. This estimate was computed in the 
following manner.

$$\mathrm{PCC} = \sum_{i=1}^{m_1} P(O_i \mid O)\, P(O) + \sum_{i=1}^{m_2} P(W_i \mid W)\, P(W) \tag{15}$$

where

$m_1$ = number of clusters labeled "other"

$m_2$ = number of clusters labeled wheat

$P(O_i \mid O)$ = probability that a pixel falls in the $i$th other cluster, given
that it is other than wheat

$P(W_i \mid W)$ = probability that a pixel falls in the $i$th wheat cluster, given
that it is wheat

$P(W)$ = the a priori probability that a pixel is wheat

$P(O)$ = the a priori probability that a pixel is other than wheat

If empirical proportions are used to estimate these probabilities and a priori
probabilities, the resulting expression is as follows.

$$\mathrm{PCC} = \frac{1}{N_T} \left( \sum_{i=1}^{m_1} N_{O_i} + \sum_{i=1}^{m_2} N_{W_i} \right) \tag{16}$$

where

$N_T$ = total number of ground-truth pixels

$N_{O_i}$ = number of ground-truth other pixels falling in the $i$th other cluster

$N_{W_i}$ = number of ground-truth wheat pixels falling in the $i$th wheat cluster


It is noteworthy that, to obtain an accurate estimate of PCC using equa- 
tion (16), it is necessary that several ground-truth pixels fall in each 
cluster. Specifically, if there are clusters which have only one or two 
ground-truth grid pixels, the estimate of PCC will be biased on the high
side. 
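The upward bias is easy to see in a small numerical sketch. With majority-rule labeling, the correctly classified dots in a cluster are simply the larger of its wheat and nonwheat counts, so the empirical PCC is the sum of those maxima divided by the total number of dots. The function name and the counts are hypothetical.

```python
def pcc(counts):
    """Empirical PCC for majority-labeled clusters.

    counts maps each cluster to a (wheat_dots, other_dots) pair.
    """
    total = sum(w + o for w, o in counts.values())
    # Under majority rule the correctly labeled dots are the larger count.
    correct = sum(max(w, o) for w, o in counts.values())
    return correct / total


# The same 24 dots (10 wheat, 14 other) clustered two ways.
coarse = {"A": (8, 4), "B": (2, 10)}
# Extreme case: every dot falls in its own cluster, so each cluster is "pure".
fine = {f"w{i}": (1, 0) for i in range(10)}
fine.update({f"o{i}": (0, 1) for i in range(14)})

pcc_coarse = pcc(coarse)
pcc_fine = pcc(fine)
```

In this illustration the coarse clustering scores 0.75 while the one-dot-per-cluster case scores 1.0, even though the second clustering carries no information about class structure.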


As a part of the analysis, the proportion of wheat also was estimated for the 
labeled clusters and compared to the ground-truth value. The equation used 
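for this estimate is

$$\hat{P}(W) = \frac{1}{N_T} \sum_{i=1}^{m_2} N_i \tag{17}$$

where $N_i$ = the total number of ground-truth pixels (wheat and other) falling
in the $i$th wheat cluster, and $N_T$ is the total number of ground-truth pixels,
as in equation (16).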
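As a sketch of this estimate, the proportion is the share of all ground-truth dots that fall in clusters labeled wheat, regardless of each dot's own label. The function name and counts are hypothetical, and ties are assumed to be labeled nonwheat.

```python
def wheat_proportion(counts):
    """Wheat proportion estimated from majority-labeled clusters.

    counts maps each cluster to a (wheat_dots, other_dots) pair.
    """
    total = sum(w + o for w, o in counts.values())
    # Every dot in a wheat-labeled cluster counts toward the estimate.
    in_wheat_clusters = sum(w + o for w, o in counts.values() if w > o)
    return in_wheat_clusters / total


counts = {"A": (8, 4), "B": (2, 10)}
estimate = wheat_proportion(counts)           # dots in wheat clusters / all dots
true_share = sum(w for w, _ in counts.values()) / 24
```

Here the estimate is 0.5 while the true wheat share of the dots is 10/24, which illustrates how mixed clusters distort the proportion estimate.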


3.3 RESULTS 

The results of these computations and the acquisitions used are given in 
tables 3 through 12. Tables 3, 4, 6, and 7 compare CLASSY and ISOCLS results 
for the LACIE segments examined; the corresponding results for simulated 
segment data are given in tables 8 through 12. 

Table 3 compares the number of clusters and the PCC estimates for ISOCLS 
($\mathrm{PCC}_I$) and for CLASSY ($\mathrm{PCC}_C$) as a result of clustering each of the four LACIE
segments examined using both methods. The PCC estimates for CLASSY are, on
the average, about 4 percentage points lower than those for ISOCLS. However,
since ISOCLS generates a factor of 4 to 6 more clusters than CLASSY, many of 
the ISOCLS clusters contain only one or two ground-truth grid-intersection 
points. As discussed in section 3.2, this means that the PCC estimates for 
ISOCLS will be biased high relative to CLASSY. In the light of this built- 
in bias, CLASSY compares very favorably to ISOCLS. 

It should be noted that the reduced number of clusters generated by CLASSY 
results in a dramatic increase in the ease with which the cluster maps may 
be interpreted visually. Examples of a portion of the cluster map generated 
by each algorithm are given in figures 2 and 3. 

The LACIE segments used in this study contained varying amounts of wheat.
The ground-truth percentages of wheat [$P(W)$] and small grains [$P(SG)$] are given
in table 4. The estimate of the proportion of wheat computed using the
ground-truth grid-intersection dots [$\hat{P}_D(W)$] is also included. An estimate of the
proportion of wheat from the ground-truth labeled clusters can be obtained
using equation (17). The wheat proportion estimates resulting from applying
this equation to the CLASSY results [$\hat{P}_C(W)$] and ISOCLS results [$\hat{P}_I(W)$] are also
given in table 4. Comparing these percentages to the ground-truth wheat
proportions shows that with the exception of segment 1965 the wheat proportion 
estimates are about 4 to 6 percent higher than the ground-truth wheat propor- 
tion values. These slightly high estimates may be due to the fact that, even 
though only wheat ground-truth dots were used to label clusters, labeled 
wheat clusters may reasonably be assumed to include some small grains. The 
last column in table 4 shows that the ISOCLS estimate was closer to the 
ground-truth wheat proportion for two segments and the CLASSY estimate was 
closer for the other two segments. 

The imagery for segment 1965 was examined in detail because the wheat propor- 
tion estimates for both CLASSY and ISOCLS deviated considerably from the 
ground truth and the PCC estimates for both algorithms were correspondingly 
low for this segment. This segment contained numerous small strip fields. 
Typically, small-field regions accentuate misregistration problems, which
appears to be the case for this segment. The misregistration of the
ground-truth reference acquisition relative to the acquisition clustered reduced PCC
values and distorted the wheat proportion estimates for both algorithms.

In order to obtain an idea about the relative performance of CLASSY and ISOCLS 
when applied to multitemporal data, four-channel green images were formed for 
each segment by applying the Kauth transformation to each of four acquisitions 
for a given segment and then selecting the green channel from each
acquisition. It was necessary to reduce the 16-dimensional data to 4 dimensions
since CLASSY is limited to 4 dimensions at the present time. Table 5 lists
the four acquisitions used for each segment. The results of comparing the
PCC values and the wheat proportion estimates for the two algorithms are
given in tables 6 and 7, respectively. Comparing table 6 and table 3 shows
that the PCC values for both algorithms remained about the same for segments
1181 and 1961 and that they increased significantly for segments 1988 and
1965. The average difference between the CLASSY and ISOCLS PCC values 
remained about 4 percent. However, the CLASSY PCC equaled the ISOCLS PCC for 
segment 1988, and the difference was very small for segment 1961. The last 
column of table 7 shows that, when the four-channel green images were used, 
the wheat proportion estimates from the CLASSY clusters were closer to the 
ground-truth values than were the ISOCLS estimates in every case. 

Tables 8 and 9 are analogous to tables 3 and 4, except that they give the 
results for the single-pass simulated data. The column labeled maximum
likelihood PCC ($\mathrm{PCC}_{ML}$) gives the overall PCC when using standard maximum
likelihood parameter estimates and classification with the number of classes
known. Note that the PCC estimates for CLASSY were higher than those for
ISOCLS in two of the four passes. In fact, on pass 2, where the separability 
was greatest, the PCC for CLASSY equaled the maximum likelihood PCC. On the 
average, the PCC for CLASSY was 1.4 percent higher than that for ISOCLS. 

The proportion estimate computed from the labeled clusters is given in
table 9. Again, the estimate from CLASSY was closer to the true value in 
two of the four passes. However, the average individual ISOCLS estimate was 
about 2 percent closer to the true value. 
The results for the simulated data using band 1 from each of the four passes
are given in table 10. Band 1 was selected arbitrarily to assess the use of
multitemporal data. Note that the PCC estimate for CLASSY was 1.0, meaning
that none of the CLASSY clusters contained a mixture of wheat and nonwheat 
grid intersection dots. 

Using the simulated data makes it possible to identify a cluster with a
certain class in the data by determining which class contributes the majority
of pixels to the cluster. After such an identification, the generating 
statistics for the subclass may be compared with the cluster statistics pro- 
duced by CLASSY. Table 11 presents the results of such a comparison for the 
pass 2 simulated data, whereas table 12 gives similar results for the cluster- 
ing using band 1 from each of the four passes. 

In the pass 2 CLASSY results, four of the five clusters could be clearly
identified with one of the generating classes or distributions. A comparison of
the mean vector and covariance matrices shows a remarkable correspondence 
between the CLASSY statistics and the generating statistics. Cluster 3 was 
about equally divided between grass 1 and grass 2. The statistics for grass 1 
are given. Similarly, cluster 2 is a mixture of stubble, fallow, and barley 2. 
The statistics for each of these classes are very similar for this pass. The 
statistics for stubble 1 are given as a representative example. 

The data from band 1 of each of the four simulated passes had more separability; 
thus, CLASSY was able to distinguish more classes. The comparison of the 
generating statistics and the CLASSY statistics is presented in table 12. 

Only the variance terms from the multipass covariance matrix were available. 
Again there is remarkable correspondence between the CLASSY statistics and 
the generating statistics. 


TABLE 1.- DESCRIPTION OF LACIE SAMPLE SEGMENTS

                                      Ground truth,   Ground truth,
Segment   Location   Acquisition      % wheat         % small grains

1181      Kans.      76070            23.4            29.0
1988      Kans.      75312            33.0            33.0
1961      Kans.      76200             8.2             8.2
1965      N. Dak.    76221            41.6            47.0

TABLE 2.- DISTRIBUTION OF CLASSES IN SIMULATED SEGMENT 
TABLE 3.- COMPARISON OF THE NUMBER OF CLUSTERS AND THE ESTIMATED PROBABILITY
OF CORRECT CLASSIFICATION USING SINGLE-PASS SEGMENT DATA

               ISOCLS                CLASSY
           Number of             Number of
Segment    clusters    PCC_I     clusters    PCC_C     PCC_C - PCC_I

1181       40          0.8410    7           0.8052    -0.0358
1988       40           .8070    8            .7661     -.0409
1961       40           .9236    11           .9028     -.0208
1965       40           .7419    9            .6774     -.0645
Average    40           .8284    8.75         .7875     -.0405
TABLE 5.- ACQUISITIONS USED IN CREATING FOUR-CHANNEL GREEN IMAGES

Segment    Acquisitions

1181       76070    76107    76124    76196
1988       75293    76127    76164    76272
1961       75227    76164    76236    76254
1965       76132    76203    76221    76258
TABLE 6.- COMPARISON OF THE NUMBER OF CLUSTERS AND THE ESTIMATED PROBABILITY
OF CORRECT CLASSIFICATION USING THE FOUR-CHANNEL GREEN IMAGE DATA

               ISOCLS                CLASSY
           Number of             Number of
Segment    clusters    PCC_I     clusters    PCC_C     PCC_C - PCC_I

1181       40          0.8667    4           0.8000    -0.0667
1988       40           .9357    16           .9357     0
1961       40           .9167    23           .9097     -.0070
1965       40           .8065    13           .7290     -.0775
Average    40           .8814    14           .8436     -.0378













TABLE 7.- COMPARISON OF WHEAT PROPORTION ESTIMATES FOR LABELED CLUSTERS
USING FOUR-CHANNEL GREEN IMAGE DATA

D_I = P_I(W) - P(W); D_C = P_C(W) - P(W)

             Ground truth      ISOCLS    CLASSY
Segment    P(W)     P(SG)      P_I(W)    P_C(W)    D_I      D_C      |D_I| - |D_C|

1181       23.4     29.0       29.2      24.1       5.8      0.7     5.1
1988       33.0     33.0       31.6      34.2      -1.4      1.2     0.2
1961        8.2      8.2        6.6       6.9      -1.6     -1.3     0.3
1965       41.6     47.0       62.5      56.5      20.9     14.9     6.0
Average    26.6     29.3       32.5      30.4       5.9      3.9     2.9
TABLE 8.- COMPARISON OF THE NUMBER OF CLUSTERS AND THE ESTIMATED PROBABILITY OF
CORRECT CLASSIFICATION USING SINGLE-PASS SIMULATED DATA
TABLE 9.- COMPARISON OF THE WHEAT PROPORTION ESTIMATES FOR LABELED
CLUSTERS USING SINGLE-PASS SIMULATED DATA

D_I = P_I(W) - P(W); D_C = P_C(W) - P(W)

Pass       P(W)      P_I(W)    P_C(W)    D_I        D_C        |D_I| - |D_C|

1          0.3398    0.3301    0.2536    -0.0097    -0.0862    -0.0765
2           .3398     .3254     .3541     -.0144      .0143      .0001
3           .3398     .3636     .2917      .0238     -.0481     -.0243
4           .3398     .3254     .3345     -.0144     -.0045      .0095
Average     .3398     .3361     .3085     -.0147     -.0312     -.0228


TABLE 10.- PROBABILITY OF MISCLASSIFICATION USING MULTIPASS
SIMULATED DATA

                         ISOCLS                CLASSY
                     Number of             Number of
Data                 clusters    PCC_I     clusters       PCC_C     PCC_C - PCC_I

Band 1 from each     40          0.9809    (illegible)    1.0000    0.0191
of 4 passes
TABLE 11.- COMPARISON OF CLUSTER STATISTICS FOR PASS 2 SIMULATED DATA
TABLE 12.- COMPARISON OF CLUSTER STATISTICS FOR BAND 1 FOR EACH OF FOUR PASSES
OF THE SIMULATED DATA

(? = entry illegible in the source copy)

                              Generating statistics    CLASSY statistics
                                        Covariance
Cluster                       Mean      matrix         Mean
number    Identification      vector    (variances)    vector    Covariance matrix

5         Wheat 1             26.9?     1.06           26.84     1.27   0.69   1.42   1.61
                              20.36     0.91           20.27      .69   1.21   1.25   1.62
                              17.39     2.15           17.22     1.42   1.25   2.32   2.65
                              17.27     ?              17.02     1.61   1.62   2.65   3.49

2         Wheat 2             25.79     1.03           ?         1.22   0.94   0.78   0.99
                              18.55     0.82           18.76      .94   1.23    .78    .87
                              16.85     0.47           16.88      .78    .78    .85    .67
                              18.??     1.76           17.97      .96    .87    .67   1.80

4         Barley 1            28.4?     2.16           28.40     2.30   1.56   3.03   2.19
                              23.30     4.86           22.71     1.56   1.81   2.69   2.17
                              ?         4.15           22.56     3.03   2.69   5.33   3.80
                              17.01     4.47           17.44     2.18   2.17   3.86   3.58

3         Barley 2            28.29     1.33           ?         1.63   -.08   1.79   1.09
                              22.76     0.77           22.71     -.08    .79   -.40   -.09
                              22.37     1.88           22.56     1.79   -.40   2.54   1.23
                              17.34     1.61           17.44     1.05   -.09   1.23   1.86

1         Grass 1             25.6?     ?              25.82     2.69   0.87   1.76   2.17
          (grass 2,           20.83     1.31           21.20      .87   1.39    .74    .98
          stubble 1)          20.10     1.80           20.35     1.76    .74   1.71   1.65
                              20.60     1.62           20.72     2.17    .98   1.65   2.43

6         Fallow 1            24.59     0.67           24.60     0.75   0.38   0.42   0.48
                              22.48     0.52           22.45      .38    .72    .68    .09
                              23.22     0.90           23.21      .42    .68   1.06    .04
                              21.56      .66           21.67      .48    .09    .04    .76

7         Stubble 2           24.33     1.17           24.34     1.31   0.38  -0.01  -0.14
          (fallow 2)          22.21     0.67           22.25      .38    .86    .09   -.15
                              22.69     0.74           ?         -.01    .09   1.01    .84
                              28.63     1.04           28.63     -.14   -.15    .84   1.36
Figure 2.- Example of the ISOCLS cluster map - segment 1181.
Figure 3.- Example of the CLASSY cluster map - segment 1181.


4. CONCLUSIONS AND RECOMMENDATIONS 


4.1 CONCLUSIONS 

The main conclusion of this study is that the performance of the CLASSY 
clustering algorithm compares favorably with ISOCLS on both the real and
simulated LACIE segment data. In terms of performance, these results were 
obtained despite the fact that CLASSY reduces the number of clusters by a 
factor of 4 to 6 as compared to ISOCLS. This would indicate that CLASSY is 
indeed approximating the empirical mixture density rather than just breaking 
up the data space into small homogeneous areas as does ISOCLS. This conclu- 
sion is further substantiated by noting the high degree of correspondence 
between the CLASSY cluster statistics and the generating statistics of classes 
in the simulated data. It appears that the CLASSY algorithm may well provide 
a solution to the fundamental problem of maximum likelihood clustering - the 
determination of the inherent number of classes in the data. 

A detailed examination of the results indicates that, in general, the PCC 
estimates for ISOCLS were slightly higher than those for CLASSY. (However, 
CLASSY did actually have higher PCC estimates on two of the simulated data 
passes.) It should be remembered in viewing these results that, because
ISOCLS had many more clusters than CLASSY, there were always ISOCLS clusters 
which contained only one or two ground-truth dots. As discussed in sec- 
tion 3.2, this tends to bias the PCC estimate for ISOCLS on the high side. 

The wheat proportion estimates for both CLASSY and ISOCLS were comparable. 
Again, ISOCLS is usually a little closer to the ground-truth value. However, 
the proportion estimates are also biased when the clusters are mixed. So, 
again, it is to be expected that ISOCLS, with its larger number of clusters, 
would generate better estimates. The fact that the estimates are only 
slightly better and sometimes worse indicates again that CLASSY is determining
the distributional structure of the data.

Finally, it should be noted that ISOCLS typically requires 3 to 5 minutes to 
process a real LACIE segment; whereas CLASSY, iterating through the data three 
times, typically requires 9 to 16 minutes of central processing unit time. 


4.2 RECOMMENDATIONS 

On the basis of these tests, it is recommended

a. That further tests be conducted using CLASSY, particularly on
multiple-pass LACIE data

b. That the CLASSY program be completely documented, including the revision
of certain parts of the program to improve the performance or speed of
the algorithm

c. That methods for incorporating the CLASSY algorithm into LACIE Procedure 1
be developed and tested